Review of Logistic Regression

Logistic Regression for B

\[spam \sim log(num\_char)\]

m1 <- glm(spam ~ log(num_char), data = email, family = "binomial")
summary(m1)
## 
## Call:
## glm(formula = spam ~ log(num_char), family = "binomial", data = email)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.815  -0.467  -0.335  -0.255   3.013  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -1.7244     0.0606   -28.4   <2e-16 ***
## log(num_char)  -0.5435     0.0365   -14.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2437.2  on 3920  degrees of freedom
## Residual deviance: 2190.3  on 3919  degrees of freedom
## AIC: 2194
## 
## Number of Fisher Scoring iterations: 5

Interpreting Logistic Regression

  1. Each row of the summary output is still a hypothesis test on that parameter being 0.
  2. A positive slope estimate indicates that there is a positive association between that particular explanatory variable and the response.
  3. Each estimate is still conditional on the other variables held constant.

Transforming from the logit (Simple case)

The logistic model approximates the \(ln(odds)\) using a linear predictor \(\beta_0 + \beta_1 X\). If we know \(ln(P/(1-P)) = \beta_0 + \beta_1 X\), what is the formula for \(P\)?

  1. To go from \(ln (odds)\) to \(odds\), we use the exponential function \(e^x\): \(odds = e^{ln(odds)}\)

  2. You can check that if \(odds = P / (1 - P)\), then solving for \(P\) gives \(P = odds/ (1 + odds)\).

  • Putting 1 and 2 together gives \[ P = \dfrac{e^{ln(odds)}}{1 + e^{ln(odds)}} = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}. \]
  • The function \(f(y) = 1/ (1 + e^{-y})\) is commonly known as the logistic function, thus, giving the name to logistic regression.

A more sophisticated model

m2 <- glm(spam ~ log(num_char) + to_multiple + attach + dollar + inherit + 
            viagra, data = email, family = "binomial")
summary(m2)
## 
## Call:
## glm(formula = spam ~ log(num_char) + to_multiple + attach + dollar + 
##     inherit + viagra, family = "binomial", data = email)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -1.931  -0.455  -0.328  -0.236   3.034  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -1.59642    0.06443  -24.78  < 2e-16 ***
## log(num_char) -0.54869    0.03831  -14.32  < 2e-16 ***
## to_multiple   -1.92889    0.30493   -6.33  2.5e-10 ***
## attach         0.19970    0.06552    3.05   0.0023 ** 
## dollar        -0.00456    0.01898   -0.24   0.8102    
## inherit        0.40003    0.15166    2.64   0.0083 ** 
## viagra         1.73607   40.59296    0.04   0.9659    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2437.2  on 3920  degrees of freedom
## Residual deviance: 2106.3  on 3914  degrees of freedom
## AIC: 2120
## 
## Number of Fisher Scoring iterations: 11

Extending the model

A GLM consists of three things:

  1. A linear combination of predictor variables
  2. A distribution of the response variable
  3. A link function between the two
  • MLR : Normal distribution, identity link function

  • Logistic Regression : Binomial distribution, logit link function

  • Poisson Regression : Poisson distribution, logarithm link function

Bayesian Inference

Please draw your own subjective distributions for the following events.

  1. The probability that Hillary Clinton will win the presidency.
  2. The probability that, on a given night, the sun has gone supernova.
  3. The total number of individual socks that you own.

Karl Broman's Socks

Classical Hypothesis Test

Assert a model

  • \(H_0\): I have \(N_{pairs}\) pairs of socks and \(N_{singles}\) singletons leading to \(N = 2 \cdot N_{pairs} + N_{singles}\). The first 11 socks that I pull out of the machine is a random sample from this population.

Decide on a test statistic

  • The number of singletons in the sample: 11.

Construct the sampling distribution

  • Probability theory or simulation.

See where your observed stat lies in that distribution

  • Find the \(p\)-value if you like.

\(H_0\)

\[N_{pairs} = 9; \quad N_{singles} = 5\]

Contructing the sampling dist.

We'll use simulation.

Create the population of socks:

sock_pairs <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K")
sock_singles <- c("l", "m", "n", "o", "p")
socks <- c(rep(sock_pairs, each = 2), sock_singles)
socks
##  [1] "A" "A" "B" "B" "C" "C" "D" "D" "E" "E" "F" "F" "G" "G" "H" "H" "I" "I" "J"
## [20] "J" "K" "K" "l" "m" "n" "o" "p"

One draw from the machine

picked_socks <- sample(socks, size = 11, replace = FALSE); picked_socks
##  [1] "C" "K" "G" "l" "I" "o" "m" "F" "E" "D" "B"
sock_counts <- table(picked_socks); sock_counts
## picked_socks
## B C D E F G I K l m o 
## 1 1 1 1 1 1 1 1 1 1 1
n_singles <- sum(sock_counts == 1); n_singles
## [1] 11

Our simulator

Constructing the sampling dist.

pick_socks(N_pairs = 9, N_singles = 5, N_pick = 11)
## [1] 9
pick_socks(9, 5, 11)
## [1] 7
pick_socks(9, 5, 11)
## [1] 7

Repeat many, many times…

The sampling distribution

The sampling distribution

The \(p\)-value

Quantifying how far into the tails our observed count was.

table(sim_singles)
## sim_singles
##   1   3   5   7   9  11 
##   2  48 248 411 250  41
table(sim_singles)[6]/1000
##    11 
## 0.041

Our two-tailed p-value is 0.082.

Question

What is the best definition for our \(p\)-value in probability notation?

  1. \(\mathbb{P}\)(\(H_0\) is true | data or more extreme) = 0.041
  2. \(\mathbb{P}\)(\(H_0\) is true) = 0.041
  3. \(\mathbb{P}\)(data or more extreme | \(H_0\) is true) = 0.041
  4. \(\mathbb{P}\)(data) = 0.041

Question

What is the best definition for our p-value in probability notation?

  1. \(\mathbb{P}\)(\(H_0\) is true | data or more extreme) = 0.041
  2. \(\mathbb{P}\)(\(H_0\) is true) = 0.041
  3. \(\mathbb{P}\)(data or more extreme| \(H_0\) is true) = 0.041
  4. \(\mathbb{P}\)(data) = 0.041

The challenges with the classical method

The result of a hypothesis test is a probability of the form:

\[ \mathbb{P}(\textrm{ data or more extreme } | \ H_0 \textrm{ true }) \]

while most people think they're getting

\[ \mathbb{P}(\ H_0 \textrm{ true } | \textrm{ data or more extreme}) \]

How can we go from the former to the latter?

What we have

What we want

Bayesian Modeling

Bayes Rule

\[\mathbb{P}(A \ | \ B) = \frac{\mathbb{P}(A \textrm{ and } B)}{\mathbb{P}(B)} \]

\[\mathbb{P}(A \ | \ B) = \frac{\mathbb{P}(B \ | \ A) \ \mathbb{P}(A)}{\mathbb{P}(B)} \]

\[\mathbb{P}(model \ | \ data \, or \, more \, extreme) = \frac{\mathbb{P}(data \, or \, more \, extreme\ | \ model) \ \mathbb{P}(model)}{\mathbb{P}(data \, or \, more \, extreme)} \]

What does it mean to think about \(\mathbb{P}(model)\)?

Please draw your own subjective distributions for the following events.

  1. The probability that Hillary Clinton will win the presidency.
  2. The probability that, on a given night, the sun has gone supernova.
  3. The total number of individual socks that you own.

Prior distribution

A prior distribution is a probability distribution for a parameter that summarizes the information that you have before seeing the data.

Prior on proportion pairs

Full simulation

head(sock_sim, 3)
##   unique pairs n_socks prop_pairs
## 1      3     4      16      0.970
## 2      7     2      33      0.914
## 3      9     1      51      0.929
sock_sim %>%
  filter(unique == 11, pairs == 0) %>%
  head(3)
##   unique pairs n_socks prop_pairs
## 1     11     0      49      0.692
## 2     11     0      37      0.873
## 3     11     0      49      0.815

Proportion of pairs

Number of socks